BACKGROUND AND HISTORY OF PESTICIDAL
CRYSTAL PROTEIN NOMENCLATURE

Since the first cloning of an insecticidal crystal protein gene 
from Bacillus thuringiensis (91), many other such genes have 
been isolated. Initially, each newly characterized gene or pro- 
tein received an arbitrary designation from its discoverers: icp 
(64); cry (21, 121); kurhd1 (31); Bta (88); bt1, bt2, etc. (40);
type B and type C (43); and 4.5 kb, 5.3 kb, and 6.6 kb (55). The 
first systematic attempt to organize the genetic nomenclature 
relied on the insecticidal activities of crystal proteins for the 
primary ranking of their corresponding genes (44). The cryI
genes encoded proteins toxic to lepidopterans; cryII genes
encoded proteins toxic to both lepidopterans and dipterans; cryIII
genes encoded proteins toxic to coleopterans; and cryIV genes
encoded proteins toxic to dipterans alone.

This system provided a useful framework for classifying the 
ever-expanding set of known genes. Inconsistencies existed in 
the original scheme, however, due to attempts to accommo- 
date genes that were highly homologous to known genes but 
did not encode a toxin with a similar insecticidal spectrum. The 
cryIIB gene, for example, received a place in the
lepidopteran-dipteran class with cryIIA, even though toxicity against
dipterans could not be demonstrated for the toxin designated
CryIIB. Other anomalies arose after the nomenclature was
established. The protein named CryIC, for example, was
reported to be toxic to both dipterans and lepidopterans (103),
while the protein designated CryIB was reported to be toxic to
both lepidopterans and coleopterans (8). Because the
nomenclature system provided no central committee or database to
maintain standardization, new genes encoding a diverse set of 
proteins without a common insecticidal activity each received 
the name cryV, based on the next available Roman numeral 
(32, 46, 67, 100, 102, 108). 

PROPOSED NOMENCLATURE 

We propose in this review a revised nomenclature for the cry 
and cyt genes. To organize the wealth of data produced by 
genomic sequencing efforts, a new nomenclatural paradigm is 
emerging, exemplified by the internationally recognized
cytochrome P-450 superfamily nomenclature system (68a, 122a).
Our proposal conforms closely to this model both in concep- 
tual basis and in nomenclature format. The underlying basis of 
this type of system is to assign names to members of gene 
superfamilies according to their degree of evolutionary diver- 
gence as estimated by phylogenetic tree algorithms. The no- 
menclature format in such a system is designed to convey rich 
informational content about these relationships by appending 
to the mnemonic root a series of numerals and letters assigned 
in a hierarchical fashion to indicate degrees of phylogenetic 
divergence. This change from a function-based to a sequence- 
based nomenclature allows closely related toxins to be ranked 
together and removes the necessity for researchers to bioassay 
each new protein against a growing series of organisms before 
assigning it a name. 

In our proposed revision, Roman numerals have been
exchanged for Arabic numerals in the primary rank (e.g.,
Cry1Aa) to better accommodate the large number of expected
new proteins. The mnemonic Cyt to designate crystal proteins 
showing a general cytolytic activity in vitro has been retained 
because of its historical precedent and entrenchment in the 
research literature. Our definition of a Cry protein is rather 
broad: a parasporal inclusion (crystal) protein from B. thurin- 
giensis that exhibits some experimentally verifiable toxic effect 
to a target organism, or any protein that has obvious sequence 
similarity to a known Cry protein. Similarly, Cyt denotes a 
parasporal inclusion (crystal) protein from B. thuringiensis that 
exhibits hemolytic activity, or any protein that has obvious 
sequence similarity to a known Cyt protein. By these criteria, 
the nontoxic 40-kDa crystal protein from B. thuringiensis subsp. 
thompsoni, for example, has been excluded from our list, but 
the lepidopteran-active 34-kDa protein (now Cry15A)
encoded by an adjacent gene has been included (11).

The freely available software applications CLUSTAL W 
(110) and PHYLIP (27) define the sequence relationships 
among the toxins to form the framework of the new nomen- 
clature. In the first step, CLUSTAL W aligns the deduced 
amino acid sequences of the full-length toxins and produces a 
distance matrix, quantitating the sequence similarities among 
the set of toxins. CLUSTAL W default settings are employed, 
except that the "delay divergent sequences" setting in the mul- 
tiple-alignment parameter menu is reduced from 40 to 0%. 
The NEIGHBOR application within the PHYLIP package 
then constructs a phylogenetic tree from the distance matrix by 
an unweighted pair-group method using arithmetic averages 
(UPGMA) algorithm. The TREEVIEW application (73), with
the "phylogenetic tree" and "ladderize left" options selected, 
produces a graphic presentation of the resulting tree. 
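
The clustering step can be illustrated with a toy example. The following is a minimal pure-Python sketch of UPGMA on a small distance matrix; it is not the NEIGHBOR program, and the function name `upgma`, the four labels, and the distances are all invented for illustration.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA (illustrative sketch, not PHYLIP NEIGHBOR).

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Returns a nested tuple (left, right, height); leaves are plain labels.
    """
    # Active clusters: id -> (subtree, number of leaves it contains).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {
        frozenset((i, j)): dist[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)      # closest pair of active clusters
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        height = d.pop(pair) / 2.0    # node placed at half the distance
        for m in clusters:
            d_im = d.pop(frozenset((i, m)))
            d_jm = d.pop(frozenset((j, m)))
            # Size-weighted mean, i.e. the average over all leaf pairs.
            d[frozenset((next_id, m))] = (n_i * d_im + n_j * d_jm) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Invented toy distances (1 minus fractional identity).
toy = {
    frozenset(("A", "B")): 0.04,
    frozenset(("C", "D")): 0.10,
    frozenset(("A", "C")): 0.50,
    frozenset(("A", "D")): 0.50,
    frozenset(("B", "C")): 0.50,
    frozenset(("B", "D")): 0.50,
}
tree = upgma(["A", "B", "C", "D"], toy)
```

Because every leaf ends at the same depth in a UPGMA tree, the node heights returned here can be compared directly against fixed vertical boundaries.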

We have applied this procedure to the set of holotype se- 
quences given in Table 1 to produce the phylogenetic tree 
presented in Fig. 1. Vertical lines drawn through the tree show 
the boundaries used to define the various nomenclatural ranks. 
The name given to any particular toxin depends on the location 
of the node where the toxin enters the tree relative to these 
boundaries. A new toxin that joins the tree to the left of the 
leftmost boundary will be assigned a new primary rank (an 
Arabic number). A toxin that enters the tree between the left 
and central boundaries will be assigned a new secondary rank 
(an uppercase letter). It will have the same primary rank as the 
other toxins within that cluster. A toxin that enters the tree 
between the central and right boundaries will be assigned a 
new tertiary rank (a lowercase letter). Finally, a toxin that joins 
the tree to the right of the rightmost boundary will be assigned 
a new quaternary rank (another Arabic number). Toxins with 
identical sequences but isolated independently will receive sep- 
arate quaternary ranks. 

By this method each toxin will be assigned a unique name 
incorporating all four ranks. A completely novel toxin would 
currently be assigned the name Cry23Aa1. For the sake of
convenience, however, we propose that the inclusion of the 
tertiary rank a and quaternary rank 1 be optional, their use 
dictated only by a need for clarity. This new toxin could there- 
fore simply be referred to as Cry23A. 
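
The four-rank name format can be made concrete with a short parser. This is only a sketch: the regular expression and the helper names `parse_name` and `short_name` are assumptions for illustration, and the shortening rule covers only the case stated above (tertiary rank a together with quaternary rank 1).

```python
import re

# Root mnemonic, primary rank (Arabic number), secondary rank (uppercase),
# tertiary rank (lowercase), quaternary rank (Arabic number).
NAME_RE = re.compile(r"^(Cry|Cyt)(\d+)([A-Z])([a-z])(\d+)$")


def parse_name(name):
    """Split a full four-rank toxin name into its components."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a full four-rank name: {name!r}")
    root, primary, secondary, tertiary, quaternary = match.groups()
    return root, int(primary), secondary, tertiary, int(quaternary)


def short_name(name):
    """Drop the optional 'a1' suffix; leave every other name unchanged."""
    root, primary, secondary, tertiary, quaternary = parse_name(name)
    if tertiary == "a" and quaternary == 1:
        return f"{root}{primary}{secondary}"
    return name
```

For example, `short_name("Cry23Aa1")` gives `"Cry23A"`, while a name such as `"Cry1Ab2"` is returned unchanged.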

In choosing locations for rank boundaries, we attempted to 
construct a nomenclature reflecting significant evolutionary 
relationships while at the same time minimizing changes from 
the gene names assigned under the old system. In the resulting 
system, proteins with a common primary rank are similar 
enough that the percent identity can be defined with some 
confidence. Proteins with the same primary rank often affect 
the same order of insect; those with different secondary and 
tertiary ranks may have altered potency and targeting within an 
order. At the tertiary rank, differences can be due to the ac- 
cumulation of dispersed point mutations, but often they appear 
to have resulted from ancestral recombination events between 
genes differing at a lower rank level (9). The quaternary rank 
was established to group "alleles" of genes coding for known 
toxins that differ only slightly, either because of a few muta- 
tional changes or an imprecision in sequencing. To avoid con- 
fusion, however, the reader should bear in mind the differences 
between the quaternary rank number and the classical concept 
of the allele. Any cry gene specified with a quaternary rank is 
a natural isolate. No assumption about functionality is implied 
by the presence of this rank number in the gene name. In 
contrast, an allele number would be assumed, unless paren- 
thetical or subscripted information indicated otherwise, to de- 
note a nonfunctional mutant form of a wild-type gene found at 
a discrete genetic locus. Because of the somewhat modular 
nature of the Cry proteins and the effect that various segmental 
relationships could have on the clustering algorithm, it is likely 
that these boundaries will move slightly or even bend as the 
addition of new sequences changes the topology of the phylo- 
genetic tree. Currently the boundaries represent approxi- 
mately 95, 78, and 45% sequence identity. 
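
As a rough illustration of how the boundaries translate into names, the sketch below maps a percent identity onto the first rank that would have to be new. It is a simplification under stated assumptions: the actual assignment depends on where the toxin's node enters the UPGMA tree, not on a single pairwise identity, the boundary values are only approximate, and the function name and the treatment of exact boundary values are ours.

```python
# Approximate identity levels (percent) of the left, central and right
# boundaries, as quoted in the text.
PRIMARY_BOUNDARY = 45.0
SECONDARY_BOUNDARY = 78.0
TERTIARY_BOUNDARY = 95.0


def first_new_rank(percent_identity):
    """Return the highest rank that would be new at this identity level."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity < PRIMARY_BOUNDARY:
        return "primary"      # new Arabic number
    if percent_identity < SECONDARY_BOUNDARY:
        return "secondary"    # new uppercase letter
    if percent_identity < TERTIARY_BOUNDARY:
        return "tertiary"     # new lowercase letter
    return "quaternary"       # new trailing Arabic number
```

A toxin joining its nearest cluster at about 60% identity would, on this reading, keep the primary rank of that cluster and receive a new uppercase letter.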

A B. thuringiensis Pesticidal Crystal Protein Nomenclature 
Committee, consisting of the authors of this paper, will remain 
as a standing committee of the Bacillus Genetic Stock Center 
(BGSC) to assist workers in the field of B. thuringiensis genetics 
in assigning names to new Cry and Cyt toxins. The correspond- 
ing gene or protein sequences must first be deposited into a 
publicly accessible database (GenBank, EMBL, or PIR) and
released by the repository for electronic publication in the
database so that the scientific community may conduct an 
independent analysis. Researchers should submit new se- 
quences directly to the BGSC director (D. R. Zeigler), either 
by electronic mail (zeigler.l@osu.edu) or on computer dis- 
kette. The director will analyze the amino acid sequence as 
described above and suggest the appropriate name, subject to 
the approval of the committee. The committee will periodically 
review the literature of the Cry and Cyt toxins and publish a 
comprehensive list. This list, alongside other relevant informa- 
tion, will also be available via the Internet at the following 
URL: http://www.biols.susx.ac.uk/Home/Neil_Crickmore/Bt/. 

The current list of cry and cyt genes (including quaternary 
ranks) is given in Table 1. New gene names are listed with their 
previous names, their GenBank accession numbers, and pub- 
lished references. The quaternary ranks were assigned in the 
order that the gene sequences were discovered in the literature 
or submitted to the committee. Genes assigned the quaternary 
rank 1 represent holotype sequences. 

The boundaries shown in Fig. 1 allow most cry genes to
retain the names they received under the system of Höfte and
Whiteley (44), after a substitution of Arabic for Roman
numerals. There are a few notable exceptions: cryIG becomes
cry9A, cryIIIC becomes cry7Aa, cryIIID becomes cry3C, cryIVC
becomes cry10A, cryIVD becomes cry11A, cytA becomes cyt1A,
and cytB becomes cyt2A (Table 1). Under the revised system,
the known Cry and Cyt proteins fall into 24 sets at the primary
rank: Cyt1, Cyt2, and Cry1 through Cry22.

ROBUSTNESS OF THE NOMENCLATURE 

The robustness of the current naming process was assessed 
by a number of additional analyses. The choice of clustering 
algorithm (unweighted pair-group method using arithmetic av- 
erages) was driven largely by the consistent location of a root 
and constant branch lengths, resulting in a common vertical 
alignment of sequence names and essentially allowing a "ruler 
across the tree" approach to naming. It has the drawback of 
imposing a common evolutionary clock on the clustering pro- 
cess, an assumption that cannot be assured. The distance met- 
ric related to percent identity (essentially 1 minus the fraction 
of identical residues of the total compared without gaps) is the 
one most commonly found as the output of sequence compar- 
ison programs, including CLUSTAL W. For phylogenetic anal- 
ysis, a more usual distance metric relates to the number of 
substitutions per site to convert one sequence to the other 
(e.g., Dayhoff's point accepted mutation [PAM]) and accounts
for the possibility of multiple substitutions per site as the se- 
quences are more divergent. The latter method has the draw- 
back of being more computationally intensive, and, for very 
divergent sequences, requiring too large a value, resulting in 
numeric computation failures. They also differ in the way se- 
quences of unequal length are handled, with the percent iden- 
tity method typically ignoring excess sequence and the other 
methods assigning a penalty. This is particularly important for 
crystal proteins, since a number of them lack the C-terminal 
protoxin segments yet are quite related to some longer toxins 
in the N-terminal toxin segment; we feel that the stronger 
association of such relationships found by the percent identity 
method is preferred. 
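
The percent-identity distance described above can be sketched as follows. This is an illustrative implementation under assumptions, not CLUSTAL W code: the function name, the gap character, and the example sequences are ours. Columns containing a gap in either sequence are skipped, which is why a truncated toxin is not penalized for its missing C-terminal segment.

```python
def identity_distance(seq_a, seq_b, gap="-"):
    """Return 1 minus the fraction of identical residues.

    Both arguments are rows of the same alignment, so they must have equal
    length. Columns with a gap in either row are not compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue              # excess or missing sequence is ignored
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 1.0 - identical / compared


# Invented example: the first row lacks the last three residues.
truncated = identity_distance("MKTAV---", "MKTAVQQQ")
one_mismatch = identity_distance("MKT-AV", "MRTQAV")
```

Here `truncated` is 0.0 because the five shared columns are identical, and `one_mismatch` is about 0.2 (one difference in five compared columns).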

To assess the effect of using the neighbor-joining method to 
generate an unrooted tree, CLUSTAL W routines were used 
to generate such a tree with 1,000 bootstraps of the sequence 
alignment we used for Fig. 1. When an appropriate outgroup 
was chosen, the resulting tree (not shown) resembled our Fig. 
1. The bootstrap values indicated that the tree thus generated
had significant branch points deeper in the tree than the
chosen primary rank in the nomenclature. This sort of analysis was
rejected as unsuitable for the purposes of Cry nomenclature
due to the generally ragged branch lengths it produced and the
requirement for the careful choice of an outgroup.

TABLE 1. Known cry and cyt gene sequences with revised nomenclature assignments
a The symbols < and > indicate that the coding region extends up- or downstream, respectively, from the known sequence data.
b Only the polypeptide sequence has been reported.

FIG. 1. Phylogram demonstrating amino acid sequence identity among Cry and Cyt proteins. This phylogenetic tree is modified from a TREEVIEW visualization
of NEIGHBOR treatment of a CLUSTAL W multiple alignment and distance matrix of the full-length toxin sequences, as described in the text. The gray vertical bars
demarcate the four levels of nomenclature ranks. Based on the low percentage of identical residues and the absence of any conserved sequence blocks in
multiple-sequence alignments, the lower four lineages are not treated as part of the main toxin family, and their nodes have been replaced with dashed horizontal lines
in this figure. [Figure label: Main Cry Lineage]

An alternative method of clustering protein sequences,
capable of handling sequences that are quite diverse, is
parsimony analysis. A consensus tree generated from 100
bootstraps of such an analysis displaces the two incomplete Cry1
sequences (Cry1Bd and Cry1Af) and the two Cry1 sequences
lacking the C-terminal protoxin segments (Cry1Ia and Cry1Ib)
into a region of the tree populated with such shortened
sequences (not shown). With the further exceptions of Cry12A
being interjected into the Cry5 cluster and a number of
sequences besides Cry6B clustering higher in the tree than
Cry6A, the proposed nomenclature successfully reflects the
grouping of sequences provided by this method of analysis as
well.

As noted above, the usual distance metrics for phylogenetic 
analysis account for multiple substitutions per site; most com- 
monly, the Dayhoff PAM metric is used. When this distance 
metric was applied to the alignment used to make Fig. 1, a 
large number of the sequence pairs were found to have infinite 
distance. Therefore, the main Cry lineage and the Cyt lineage 
were separately aligned, the distances were calculated, and the 
distance matrices were clustered by using the FITCH program 
(of the PHYLIP software package). This method of analysis 
revealed several strongly associated groups of sequences 
(>90% of trees) in the main Cry lineage that extend deeper 
into the tree than the primary rank assigned in the proposed 
nomenclature: Cry1; Cry3; Cry4; Cry7; the Cry5,
Cry12-Cry13-Cry14-Cry21 group; the Cry8-Cry9 group; the
Cry10-Cry19 group; the Cry16-Cry17 group; and the Cry2-Cry11-Cry18
group. Many of these groups, however, were separated by 
branch points that were either nonmajority or were found 
<60% of the time; thus, the arrangement of these groups 
would be likely to change with additional sequence additions. 
At the secondary rank, the only anomaly with respect to the 
proposed nomenclature was the interjection of the Cry1Ia and
Cry1Ib sequences into the Cry1B group. This effect may be due
to an artificially reduced distance between the Cry1I sequences
and the incomplete Cry1Bd sequence caused by the particular
distance metric used. The Cyt lineage sequences were sepa- 
rated into the expected two primary rank groups that separate 
into the expected secondary rank groupings. This more stan- 
dard phylogenetic approach also suffers from an accentuated 
visual disorientation of uneven branch lengths and shortening 
of the more closely related branches, especially at the tertiary 
rank (lowercase letter), where a great deal of comparative 
work has been done among the Cry1 toxins.

In summary, the proposed nomenclature uses readily avail- 
able software that can be easily interpreted by investigators in 
the field and meets their needs as well as, or better than, 
alternative methods of analysis and presentation. When the 
holotype toxins were analyzed by alternative phylogenetic 
methods, the hierarchy implied by the nomenclature was es- 
sentially consistent with the resulting phylogenetic clustering, 
and the few exceptions were largely explainable by known 
properties of the sequences in question. 

The structure of the δ-endotoxin from Bacillus
thuringiensis subsp. tenebrionis that is specifically
toxic to Coleoptera insects (beetle toxin) has been
determined at 2.5 Å resolution. It comprises three
domains which are, from the N- to C-termini, a
seven-helix bundle, a three-sheet domain, and a β
sandwich. The core of the molecule encompassing
all the domain interfaces is built from conserved
sequence segments of the active δ-endotoxins.
Therefore the structure represents the general fold
of this family of insecticidal proteins. The bundle
of long, hydrophobic and amphipathic helices is
equipped for pore formation in the insect
membrane, and regions of the three-sheet domain are
probably responsible for receptor binding.



The δ-endotoxins are a family of insecticidal proteins produced
by Bacillus thuringiensis (B.t.) during sporulation, having relative
molecular masses (Mr) 60,000-70,000 (60K-70K) in the active
form and specific toxicities against insects in the orders of
Lepidoptera, Diptera and Coleoptera (refs 1, 2). These toxins have been
formulated into commercial insecticides for three decades (ref. 3), and
now insect-resistant plants are engineered by transformation
with Lepidoptera-specific toxin genes (refs 4-6). In the bacterium
δ-endotoxins are synthesized as protoxins of Mr 70K-135K and
crystallize as a parasporal inclusion ~1 μm in size, in which form
they are ingested by the susceptible insect. The microcrystal
dissolves in the alkaline pH of the midgut and the protoxin is
cleaved by gut proteases to release the active toxin. δ-Endotoxins
activated in vitro bind specifically and with high affinity (K_D ≈
0.1-20 nM) to protein receptors on brush-border membrane
vesicles derived from the gut epithelium of target insects (refs 7-9) and
create leakage channels of 10-20 Å diameter in the cell
membrane (ref. 10). In vivo such membrane lesions lead to swelling and
lysis of the gut epithelium (ref. 11) and death of the insect ensues
through starvation and septicaemia. Active δ-endotoxins of
different specificities show five strongly conserved regions in
their amino-acid sequences (refs 1, 12). Exchanging sequence segments
in the divergent regions between toxins of different specificities
can produce active hybrids showing altered target
specificity (refs 13-15). We have determined the atomic structure of a
Coleoptera-specific δ-endotoxin (CryIIIA, beetle toxin) from
B.t. subsp. tenebrionis (refs 16-18) to elucidate the structural basis for
target specificity and membrane perforation by this family of
proteins.

Structure determination 

Parasporal crystals of the beetle toxin contain the full-length
644-residue protoxin (ref. 17) as the minor component, and a product
of bacterial processing with 57 residues removed from the
N-terminus as the major component (ref. 19). The latter (Mr 67K) is
similar in sequence to the active form of other δ-endotoxins.
After solubilization, papain cleavage converts the mixture to the
67K toxin (see legend to Table 1). This was recrystallized in the
original crystal form of the parasporal crystals, space group
C222₁ and cell dimensions 117.1 by 134.2 by 104.5 Å, containing
one molecule per asymmetric unit and 55% solvent by volume (ref. 18).

Initial evaluation of derivatives was carried out at 4.5 Å
resolution with data collected on the FAST TV diffractometer (ref. 20) using
CuKα radiation. Complete datasets (Table 1) were then
collected to 2.5 Å resolution from native crystals using the imaging
plate systems at the EMBL outstation at DESY and from the
mercury and platinum derivatives on film at SRS Daresbury.
The electron density map (Fig. 1) at 2.5 Å resolution calculated
with phases from multiple isomorphous replacement (mean
figure of merit, 0.63) was easily interpretable and was improved
by solvent flattening (refs 21, 22). A continuous polypeptide chain from
residue 61 to residue 644 at the C terminus was traced
unambiguously, and most side-chain atoms could be located in the
map. The atomic model was built using the graphics program
O (ref. 23) and had an initial R-factor of 37% for all data to
2.5 Å. After preliminary refinement using the program X-PLOR
(ref. 24), the current model, containing 584 amino acid residues
and 40 bound water molecules, has an R-factor of 19.9% and
r.m.s. bond length deviation of 0.017 Å.

Description of the structure 

Overview. The beetle toxin is a wedge-shaped molecule with a
radius of gyration of 58 Å. As shown in Fig. 2a, it comprises
three domains. Domain I, from the N terminus of the 67K toxin
to residue 290, is a seven-helix bundle in which a central helix
is completely surrounded by six outer helices tilted at about
+20° to it (Fig. 3b,c). Domain II, from residues 291 to 500,
contains three antiparallel β sheets packed around a
hydrophobic core with a triangular cross-section (Fig. 4). Domain III,
from residues 501 to 644 at the C terminus, is a sandwich of two
antiparallel β sheets (Fig. 5). Domains I and III make up the
bulky end of the molecule. Through their contact one of the
two β sheets in domain III is almost entirely buried. To our
knowledge (see, for example, ref. 25), the packing of helices in
domain I and of sheets in domain II are both novel
arrangements.

TABLE 1 Data collection and phasing statistics

Data collection

Data             Method of    Number of  Resolution  Number of     Unique reflections  Rmerge*
                 collection   crystals   (Å)         measurements  (% completeness)
Native           image plate  8          2.5         121,767       27,727 (100)        0.108
CH3HgNO3         film         7          2.5         103,623       27,767 (100)        0.095
Hg(CH3COO)2      film         5          2.5         60,224        25,919 (94.5)       0.103
cis-Pt(NH3)2Cl2  film         7          2.5         86,629        25,924 (94.5)       0.107
K2OsO4           FAST         1          4.5         21,143        4,680 (100)         0.077
HoCl3            FAST         1          4.5         20,013        4,701 (100)         0.069

Phasing statistics

Derivative       Anomalous  Number    Δderiv†  RCullis‡  Phasing power§
                 data       of sites                     (resolution, Å)
CH3HgNO3         no         3         0.183    0.715     1.56 (2.5)
Hg(CH3COO)2      yes        6         0.247    0.609     2.28 (2.5)
cis-Pt(NH3)2Cl2  no         5         0.185    0.682     1.54 (2.5)
K2OsO4           no         4         0.149    0.757     1.26 (5.5)
HoCl3            no         3         0.095    0.741     1.35 (5.0)

Protein preparation: Solubilized parasporal crystals from B.t. subsp. tenebrionis were incubated at 0.5 mg ml^-1 protein with 0.125 units per ml of
agarose-linked papain (Boehringer) in 3.3 M NaBr, 0.05 M sodium phosphate, pH 7.0, and 0.1 mg ml^-1 phenylmethylsulphonylfluoride (PMSF) for 30 min at
20 °C. Digestion was stopped by adding tosyl lysine chloromethylketone (TLCK) to 0.125 mg ml^-1 and Na2CO3 to one fifth volume and removing the
enzyme-beads. The 67K beetle toxin was then purified by gel filtration on Sephadex G75 equilibrated with 0.1 M NaHCO3, pH 10.5, 0.5 M NaBr. Crystallization:
Single crystals were obtained by microdialysis at a protein concentration of 2.5 mg ml^-1 against 0.1 M NaHCO3, pH 9.5, 1.2 M NaBr at 4 °C overnight, then
against 0.1 M NaHCO3, pH 9.2, 0.5 M NaBr at 16 °C; 3 mM NaN3, 0.1 mM PMSF and 0.1 mg ml^-1 TLCK were present in all buffers. Crystals were transferred
by stages to 0.05 M 2-(N-morpholino)ethanesulphonic acid (MES), pH 6.5, for derivative preparation and mounted in 0.03% low-melting agarose in this buffer
during data collection. Data collection: Image plate and film data were processed using MOSFLM (Imperial College, London) and CCP4 programs (Daresbury,
UK). FAST (ref. 20) data were collected and processed with MADNES (ref. 45), and scaled in 3° batches. Derivatives: Crystals were soaked respectively in 0.25 mM
CH3HgNO3 for 3.5 h, in 1 mM Hg(CH3COO)2 for 14 h, in freshly prepared 1 mM cis-Pt(NH3)2Cl2 for 21 h, in saturated K2OsO4 for 35 h, and in 2 mM HoCl3 for
3 days. Phase calculation: Two heavy-atom sites in each derivative were located from difference Patterson functions, except in the case of Hg(CH3COO)2
for which 3 sites were located, and the remaining sites were found by cross-phased difference Fouriers. Heavy-atom parameters were refined against
centric data and phases calculated for all data using the program PHARE (G. Bricogne). The two low-resolution derivatives were refined against phases
calculated from the high-resolution derivatives. Phasing with the three high-resolution derivatives gave an overall figure of merit of 0.61 (25-2.5 Å) and a
clearly interpretable map. Including the remaining derivatives slightly improved the connectivity of the map (overall figure of merit 0.63), and four cycles of
solvent flattening using a 50% solvent content and a 9 Å radius in mask calculation (refs 21, 22) improved the overall definition of densities. The starting model
was built using the program O (ref. 23) with the Bones option for main-chain tracing and the autobuild and manip options for side chains. Refinement by
simulated annealing using the program X-PLOR (ref. 24) reduced the R-factor from 0.37 to 0.25 without individual B-factors, and to 0.23 with restrained
individual B-factors. The model was adjusted in the loops 154-156, 429-436, and 483-488, and had 40 solvent molecules added, then refined by X-PLOR
again. The current model has an R-factor of 19.9%, with r.m.s. bond length deviation of 0.017 Å, r.m.s. bond angle deviation of 3.2°, and average atomic
B-factor of 18 Å².

* $R_{\mathrm{merge}} = \sum |I_i - \langle I \rangle| / \sum |\langle I \rangle|$, where $I_i$ are intensity measurements for a reflection, and $\langle I \rangle$ is the mean intensity for this reflection.

† $\Delta_{\mathrm{deriv}} = \sum |F_{PH} - F_P| / \sum |F_P|$, where $F_{PH}$ is the structure factor amplitude of the derivative crystal and $F_P$ is that of the native.

‡ $R_{\mathrm{Cullis}} = \sum \bigl| |F_{PH} \pm F_P| - F_H(\mathrm{calc}) \bigr| / \sum |F_{PH} - F_P|$, where $F_P$ and $F_{PH}$ are defined as for $\Delta_{\mathrm{deriv}}$, and $F_H(\mathrm{calc})$ is the calculated heavy-atom structure factor amplitude,
summed over centric data only.

§ Phasing power $= \langle F_H \rangle / E$, the r.m.s. heavy-atom structure factor amplitudes divided by the residual lack of closure error.

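
The merging statistic reported in Table 1 can be illustrated numerically. Its footnote defines R_merge as the summed absolute deviation of each intensity measurement from the mean for its reflection, divided by the summed mean intensities. The sketch below is one reading of that definition, with the denominator accumulated once per measurement; the function name, the dictionary layout, and the intensities are invented for illustration and do not come from the paper's data processing.

```python
def r_merge(observations):
    """Compute R_merge from repeated intensity measurements.

    observations: dict mapping a reflection index (h, k, l) to a list of
    intensity measurements of that reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        # One mean per measurement, which equals the plain sum of intensities.
        denominator += mean_intensity * len(intensities)
    if denominator == 0.0:
        raise ValueError("no measured intensity to normalize by")
    return numerator / denominator


# Invented measurements for two reflections.
example = {
    (1, 0, 0): [100.0, 110.0],
    (0, 2, 0): [50.0, 50.0, 56.0],
}
value = r_merge(example)
```

For these toy numbers the deviations sum to 18 and the intensities to 366, so `value` is about 0.049; the values in Table 1 (0.07 to 0.11) are on the same scale.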

Domain I. The central helix in this seven-helix bundle is α5 (Fig.
3b,c), which is oriented with its C terminus towards the bulky
end of the molecule. Viewed from this end, the outer helices
are arranged anticlockwise in the order of α1, α2, α3, α4, α6
and α7, with helices α1 and α7 adjacent to the β-sheet domains;
α2 is interrupted by a non-helical section and only the leading
half, α2a, is packed against α5. Figure 3a shows the alignment
of amino-acid sequence on the surfaces of the helices. The
helices are long, especially α3 to α7, which contain respectively
8, 7, 6, 9 and 7 complete helical turns and hence would be long
enough to span the 30-Å thick hydrophobic region of a
membrane bilayer. Furthermore, the six outer helices bear a strip of
hydrophobic residues (defined by ΔG ≥ 0 for transfer from oil
to water) down their entire length on the side facing helix α5,
so they are amphipathic. In keeping with the general observation
that secondary structures are close-packed and bury
hydrophobic surfaces (ref. 26), the helix contact angles in this domain cluster
around +20° rather than -50°, giving the bundle a bouquet-like
appearance (Fig. 3b). Figure 3c shows the bundle in
cross-section. The interhelical space contains 27 aromatic residues
which are packed in the edge-to-face fashion (ref. 27); all polar groups
in this region are hydrogen-bonded or in salt bridges.
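
The claim that these helices could span a 30-Å hydrophobic layer can be checked with simple arithmetic. The sketch below assumes textbook α-helix geometry (3.6 residues per turn and a rise of 1.5 Å per residue, i.e. 5.4 Å per turn); those constants and the variable names are our assumptions, while the turn counts are the ones given in the text.

```python
RESIDUES_PER_TURN = 3.6   # ideal alpha-helix (assumed)
RISE_PER_RESIDUE = 1.5    # angstroms along the helix axis (assumed)
BILAYER_CORE = 30.0       # angstroms, hydrophobic thickness from the text


def helix_length(turns):
    """Axial length in angstroms of an ideal alpha-helix with this many turns."""
    return turns * RESIDUES_PER_TURN * RISE_PER_RESIDUE


# Complete helical turns reported for helices alpha-3 to alpha-7.
TURNS = {"alpha3": 8, "alpha4": 7, "alpha5": 6, "alpha6": 9, "alpha7": 7}

lengths = {name: helix_length(n) for name, n in TURNS.items()}
spans_bilayer = {name: length >= BILAYER_CORE for name, length in lengths.items()}
```

Even the shortest of the five, with six turns, comes to roughly 32 Å on these assumptions, so all five exceed the 30-Å thickness.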

The concentric arrangement of the seven-helix bundle is
distinct from the two-layered type seen in bacteriorhodopsin. There
is some resemblance to the pore-forming domain of colicin A (ref. 28),
in which two hydrophobic helices are shielded from solvent by
eight amphiphilic helices, but the colicin helices are generally
shorter. Like the colicin helices, the bundle in the beetle toxin
may be a soluble form of packaging for the hydrophobic and
amphiphilic helices that will form pores in the membrane after
a large change in conformation.

Domain II. In Fig. 4a and b the three sheets of this domain are
laid side-by-side, as they would be seen from the solvent. There
is an apparent structural duplication between the four-stranded
antiparallel sheets, sheet 1 and sheet 2. The chain connections,
β4, β3, β2, β5 and β8, β7, β6, β9, respectively, follow the order
of +3, -1, -1, +3, which is typical of the 'Greek-key' topology (ref. 29).
From both sheets the inner strands, β3 and β2 as well as β7 and
β6, extend some 20 Å to the apex of the molecule as
two-stranded β ribbons; and at the point of departure from the
sheets there is a β-bulge in β3 and in β7 to twist the plane of
the ribbon by nearly 90° relative to the sheet. The connections
between the outer strands cross over the ribbons on the solvent
side.

FIG. 1 Electron density map in the neighbourhood
of Cys 243, calculated a, using combined phases (ref. 46)
from multiple isomorphous replacement and
solvent flattening, and b, using combined
experimental and model phases (ref. 46) after refinement by
X-PLOR. The refined structure is shown
superimposed for reference. Although Cys 243 is a major
site of both the methylmercury (MM) and mercuric
acetate (MA) derivatives, the methylmercury site
is in a hydrophobic environment compared with
the mercuric acetate site.

The pseudo-symmetry between these sheets is very
approximate. Using the least squares option in O (ref. 23), the sheet
region of the strands β3 and β2 can be brought to superimpose
on that of β7 and β6, with an r.m.s. fit of 0.72 Å for 13 α carbons.
But the r.m.s. fit increased to 1.1 Å for 23 α carbons of the
whole inner strands including the ribbon region, and 1.7 Å for
36 α carbons on all four strands. Nonetheless, the sequence
alignment brought by this superposition of the two sheets
revealed a low level of internal homology, with seven pairs of
equivalent residues (shown in bold) out of 41 aligned α carbons:

338 HRIQraTRFQP(6)SFNYWS(l)RYVSTRPSI(0)GSronTSPF(10)»LKPIf 395 
402 AVAHTNUVWP(0)SAVYSG(l)TKVBFSQyH(3)D8ASTQTYDS(7 JSWDSI 453 
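
The r.m.s. fits quoted for the superposition of sheet 1 on sheet 2 come from a least-squares fit of paired Cα positions. As an illustration only, the following sketch computes such a fit with the Kabsch algorithm. It is not the routine implemented in program O, and the function name and the use of NumPy are assumptions made for this example; the input is any two equally sized lists of paired Cα coordinates.

```python
import numpy as np


def superpose_rmsd(mobile, target):
    """Least-squares superposition (Kabsch) of two paired (N, 3) coordinate
    sets; returns the r.m.s. deviation after the optimal fit, in the same
    units as the input (for example, angstroms)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("both inputs must be (N, 3) arrays of paired atoms")

    # Remove the translational component by centring both sets.
    p = mobile - mobile.mean(axis=0)
    q = target - target.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    u, _, vt = np.linalg.svd(p.T @ q)

    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    fitted = p @ rotation.T
    return float(np.sqrt(((fitted - q) ** 2).sum(axis=1).mean()))
```

Called with the 13, 23 or 36 paired Cα coordinates of the two sheets, a routine of this kind would return the corresponding r.m.s. value; the coordinates themselves are not reproduced here.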

The three-stranded sheet 3 is formed by two separate polypeptide
segments. The C-terminal segment of domain II contributes the
two-stranded ribbon of β10 and β11, whereas the N-terminal
segment of this domain contributes strand β1, which is
hydrogen-bonded to β11; β1 is followed by a two-turn helix α8 and
an extended chain.

Figure 4c and d show in side view and in cross-section that
the three antiparallel sheets are packed around a triangular
hydrophobic core. This brings the strand β10 on the edge of
sheet 3 into proximity with strand β4 on the edge of sheet 1, as
well as placing the loops at the end of the three β ribbons into
a region of about 12 Å radius at the molecular apex. This domain
is in contact with helix α7 of domain I on the face of sheet 3
(Fig. 4c).

Domain III. Figure 5 is a ribbon drawing of the strands forming
the two sheets of the β sandwich. The sheet containing the
C-terminal strand is in contact with domain I and will be called
the inner sheet. This domain has the 'jelly-roll' topology 29,
because it can be generated by folding an antiparallel β ribbon
which starts with β13 (N terminus) and β23 (C terminus) on the
inner sheet, and ends in the loop between β18 and β19 on the
outer sheet; β14 is a short excursion from this ribbon and forms
the fifth antiparallel strand of the outer sheet. In addition, small
parallel sheets are formed at the edge of the β sandwich through
hydrogen bonding of strand β12 to β16 at the edge of the outer
sheet, and β1 to β13 at the edge of the inner sheet.
Distribution of conserved sequences. The core of the beetle toxin
molecule encompassing the domain interfaces is built from the
five sequence blocks that are highly conserved throughout the
δ-endotoxin family 1 (Fig. 2b, c). Block 1, located in the beetle
toxin sequence at residues 189-218, corresponds to the central
helix (α5) of the bundle in domain I. Block 2, residues 239-305,
overlaps with the latter half of α6, and with α7 and β1; the
latter hydrogen-bonds to the edge of the inner sheet in domain
III before forming part of the three-stranded sheet 3 in domain
II. Block 3, residues 491-538, overlaps with the latter part of
β11, where it is hydrogen-bonded to β1, and with the loops
connecting domains II and III. The remainder of block 3
together with blocks 4 and 5, namely residues 560-569 and 633
to the C terminus, respectively, constitute the three buried
strands of the inner antiparallel sheet in domain III. The high
degree of conservation of internal residues implies that
homologous proteins would adopt a similar fold. Using the
beetle toxin structure as a model, we can therefore propose a
basis for the insecticidal activity of δ-endotoxins as a family.

Basis of insecticidal function 

Solubility. The beetle toxin crystals are isomorphous with the
parasporal crystals 18,19 and show the molecular contacts
responsible for solubility behaviour in vivo. Four intermolecular
salt bridges, Asp 142-Arg 165, Asp 224-Arg 562, Asp 590-Arg 178,
and Glu 223-Lys 293, are located at contacts to three different
neighbouring molecules. Such salt bridges keep the protoxin
crystals insoluble until exposed to the extreme pHs in the insect
midgut.

Proteolytic activation. Pro-δ-endotoxins have M_r values of either
~130K or ~70K. Activation by larval gut proteases removes
the C-terminal half of the larger protoxins 30,31 and cleaves them
at residue 28 or 29 from the N terminus. The smaller protoxins,
such as that of the beetle toxin, are processed only at the N
FIG. 2 Overview. a, Schematic ribbon representation of the beetle
toxin showing the domain organization. Secondary structure
assignments are given by Yasspa within program O (ref. 23). The
polypeptide pathway is indicated by colouring the chain in the
rainbow order, from red at the N terminus to blue at the C
terminus. The three domains are: I, a seven-helix bundle (upper
left); II, a three-sheet assembly (bottom); and III, a β sandwich
(upper right). This and all following illustrations of the
structure are made with the program MOLSCRIPT 47. b and c, Cα
trace (stereoview) of the molecule with the five conserved
sequence blocks indicated by small beads at their Cα positions.
In b the view is as in a, and in c it is down the central helix
of the bundle from the bulky end of the molecule; c shows that
the central helix of domain I and the inner sheet of domain III
are conserved; b shows that the helices at the domain I-II
interface and the loops at the domain II-III interface are also
conserved. Note in c the helix packing of six around one in
domain I. d, The solvent channel in the C222₁ lattice viewed
along the c axis. One half of the unit cell thickness is shown,
containing four molecules. The other half of the cell is related
to this by a two-fold rotation about horizontal axes (blue lines)
at (|, y, ± J). The stacking of both layers leaves solvent
channels that traverse the cell along the c direction. The N
terminus of the molecule (arrow) is accessible from these
channels.
FIG. 3 The seven-helix bundle. a, Helical nets showing the
position of amino-acid residues along the 7 helices: α1 (63-79);
α2 (α2a, 85-98 and α2b, 104-117), α3 (123-152), α4 (160-185), α5
(193-214), α6 (222-254) and α7 (259-285). The cylindrical
surfaces of the helices are cut longitudinally on the side facing
the solvent and flattened to give a view from the interior of the
bundle. The top of the drawing corresponds to the bulky end of
the whole molecule. Owing to tilting of the outer helices,
different helices are in register vertically only at a level
indicated by two arrows pointed at α1 and α7; α5 is the central
helix. Dotted curves outline the strip of hydrophobic residues
down the inward surface of the other six helices. b, Cα trace
(stereoview) for the bundle viewed perpendicular to α5. The
relative tilt of the outer helices to α5 and that between
adjacent outer helices are both about 20°. The Cα trace is shaded
grey over helices α1 to α3 in the back, striped over helix α5 in
the centre, and white over helices α4, α6 and α7 in the front.
c, Cross-section of the bundle at the level indicated by the
arrows in a, viewed from the bulky end of the molecule. The
helical backbone is represented by curly ribbons passing through
the Cα positions. The outer helices are positioned roughly
hexagonally around the central one and tilted relative to it, so
the bundle forms a left-handed superhelix. The aromatic side
chains are packed in an edge-to-face fashion. Hydrogen bonds are
shown for side-chain atoms.









terminus 19,32 where about 50 residues are removed. The activated
δ-endotoxins show a conserved C terminus, so-called sequence
block 5 (ref. 1). Its position as the middle strand of the buried
β sheet in domain III precludes further processing from the C
terminus. In fact deletion from this site by 4 to 8 residues results
in inactive mutants with altered solubility and immunogenicity
30,33,35. This is not surprising as the inner sheet can be
expected to play a critical part in the structural integrity and
stability of the toxins through interaction with the helical bundle.

At the N-terminal cleavage sites the different protoxin
sequences show locally similar hydropathy profiles 36,37, which
would be consistent with a common topology for the N-terminal
region of the activated toxins as seen in the helical bundle of
the beetle toxin. In crystals of the beetle toxin, the N terminus
at the start of helix α1 borders on a large solvent channel of
about 30 Å diameter that crosses the unit cell along the c
direction (Fig. 2d). This channel could allow access of
sporulation-associated proteases to the cleavage site in
parasporal crystals 19.
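
A hydropathy profile of the kind compared at the N-terminal cleavage sites is a sliding-window average of per-residue hydrophobicity along the sequence. The sketch below is illustrative only: it assumes the Kyte-Doolittle scale and a nine-residue window, neither of which is stated in the text, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the sliding-window mean hydropathy of a one-letter protein
    sequence; one value per window position."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Profiles computed this way for the regions flanking each cleavage site could then be overlaid to judge whether they are locally similar.
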
Receptor binding. The insecticidal selectivity of δ-endotoxins is
due to high-affinity binding to specific membrane receptors
7-9,38, which in three cases seem to be glycoproteins 38-40. For
several δ-endotoxins the specificity-determining regions have
been delimited by exchanging sequence segments between closely
related toxins of differing specificities 13-15. Guided by the
location of secondary structures in the beetle toxin, a plausible
alignment of δ-endotoxin sequences was made for the non-conserved
regions (ref. 12, and T. C. Hodgman, unpublished results). Hence
the genetically identified specificity-determining regions can be
mapped to equivalent positions in the beetle toxin structure, and
these fall mainly in domain II. For instance, the dual
specificity of CryIIA for Lepidoptera and Diptera, as distinct
from the Lepidoptera specificity in the closely related CryIIB,
is determined by residues 307-382 of their sequences 14, which
correspond roughly to sheet 1 (Fig. 4a) plus strand β6 in sheet 2
and the loop leading up to β7, whereas the Lepidoptera
specificity of CryIIB is dependent on a longer segment 14 that
would include both inner strands of sheet 2. Similarly, the
toxicities of CryIA(a) and CryIA(c) to two lepidopteran insects
depend on three segments termed x, y and z (ref. 15): amino-acid
substitutions in y can reduce toxicity by up to 2,000-fold, and
segments x and y interact in determining specificity. Aligned
with the beetle toxin structure, segment x corresponds roughly
to the outer strands β4 and β5 of sheet 1 and the whole of sheet
2, including the loop entering β10 in sheet 3; y corresponds to
strand β10 of sheet 3 and the loop connecting β10 and β11; and
z extends from β11 to the C-terminal activation site. Furthermore,
the interaction between x and y can be understood in terms of
the proximity between β4 on the edge of sheet 1 and β10 on the
edge of sheet 3. Although z was inferred 15 to extend into
domain III, the combined evidence from genetics and
receptor-binding assays in vitro for Lepidoptera toxins 9,41
correlates receptor recognition with sequence variations within
domain II. We note that the β ribbons from all three sheets
terminate in loops in a small region on the molecular apex, in a
manner reminiscent of the complementarity-determining region of
immunoglobulins.

© 1991 Nature Publishing Group

FIG. 4 The three sheets of domain II. a, Schematic ribbon drawing
of sheets 1, 2, and 3 laid side-by-side. Each is viewed from the
exterior of the domain. Note the Greek-key topology of sheets 1
and 2 and the similarity between their fold. b, Hydrogen bonding
of the polypeptide backbone for the three sheets. The β strands
are shown by the main-chain atoms and by the residue numbers at
their ends; connecting strands are shown as coils. c, Cα trace of
the three assembled sheets in domain II viewed towards domain I
(stereoview). The Cα trace is shaded grey over sheet 1, striped
over sheet 2, and white over sheet 3. d, Cross-section of domain
II (stereoview) showing the packing of three sheets in a triangle
around the hydrophobic core. The view is towards domain III.

FIG. 5 Domain III, schematic ribbon representation of the β
sandwich. β strands forming the inner sheet are shaded grey. The
topology of an eight-stranded 'jelly-roll' can be seen by
following the β hairpin starting with β13, β15 and β23 in the
inner sheet, continuing to β16 and β22 in the outer sheet, then
β17 and β21, β20 in the inner sheet, and ending with β18 and β19
in the outer sheet. β14 is an excursion from the hairpin and
forms a fifth antiparallel strand of the outer sheet. Small
parallel β sheets are added to one edge of the β sandwich, by
hydrogen bonding of β1 to β13 in the inner sheet and β12 to β16
in the outer sheet. Residue numbers in the β strands are: β12,
502-506; β13, 509-513; β14, 519-525; β15, 536-541; β16, 547-554;
β17, 558-569; β18, 573-579; β19, 585-591; β20, 604-609; β21,
611-614; β22, 619-625; and β23, 631-643.

Pore formation. The common mechanism of epithelial cell
disruption by δ-endotoxins of widely different specificities is
believed to be the formation of lytic pores of 10 to 20 Å
diameter in the insect membrane 10. The structure of the beetle
toxin displays an apparatus for pore formation in the long,
hydrophobic and amphipathic helices of domain I which could
penetrate the membrane. Between the crystal structure in which
the bouquet-like helical bundle internalizes all the hydrophobic
surfaces, and the unknown pore structure where hydrophobic
surfaces would be in intimate contact with the membrane lipids,
large conformational changes must occur. In the absence of a full
characterization of the pore-forming process, we propose the
following by extrapolation from the crystal structure.
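
Whether a helix is long enough to penetrate a membrane can be estimated from its residue range. The sketch below uses the helix boundaries given in the legend to Fig. 3 and the 1.5 Å rise per residue of an ideal α-helix. The 30 Å figure for the hydrophobic thickness of a bilayer is an assumed round number introduced for illustration, not a value from the text, and the names are hypothetical.

```python
RISE_PER_RESIDUE = 1.5  # angstroms per residue along an ideal alpha-helix

# Residue ranges of the domain I helices, from the legend to Fig. 3.
DOMAIN_I_HELICES = {
    "alpha1": (63, 79),
    "alpha2a": (85, 98),
    "alpha2b": (104, 117),
    "alpha3": (123, 152),
    "alpha4": (160, 185),
    "alpha5": (193, 214),
    "alpha6": (222, 254),
    "alpha7": (259, 285),
}


def helix_length(start, end):
    """Approximate axial length (angstroms) of a helix spanning residues
    start..end inclusive."""
    return (end - start + 1) * RISE_PER_RESIDUE


def spanning_helices(helices, thickness=30.0):
    """Names of helices at least as long as the assumed bilayer thickness."""
    return [
        name for name, (start, end) in helices.items()
        if helix_length(start, end) >= thickness
    ]
```

Under these assumptions helices α3 to α7 each exceed 30 Å, whereas α1 and the two halves of α2 do not.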

The trigger for the conformational changes may be provided
by receptor binding and the consequent interaction of toxin
with the membrane bilayer. Membrane insertion follows rapidly,
so that a major part of the bound δ-endotoxin cannot be
displaced from the brush-border vesicles by other toxins
recognizing the same receptor sites 7,9. As domain II and
probably its apical region are most likely to bind the membrane
receptors, the helices are expected to insert with the 'domain II
end' (see Fig. 2a) oriented towards the cytoplasm. If helical
hairpins are to initiate the membrane penetration, as probably
happens for colicin 28,42,43, they will probably be linked at the
domain II end. So either of the helix pairs α6-α7 or α4-α5 could
be the likely initiator. The α6-α7 pair is favoured because it
forms part of the conserved interface with domain II and is well
positioned to sense the receptor binding. On the other hand,
helix α5 is the most conserved throughout the family of
δ-endotoxins. Point mutations in α5 reduce toxicity of a
Lepidoptera toxin without reducing binding to membranes 44.
Proteolysis in the interhelical loops at the domain III end, as
in the α3-α4 loop 19,32, may facilitate release of the helix
pairs from the tertiary structure of the bundle. The insertion of
a hairpin can create a defect in the membrane, allowing the rest
of domain I to participate in pore formation in a cooperative
manner.
Summary

Background: Genetically modified (GM) crops that express
insecticidal protein toxins are an integral part of modern
agriculture. Proteins produced by Bacillus thuringiensis (Bt)
during sporulation mediate the pathogenicity of Bt toward a
spectrum of insect larvae whose
breadth depends upon the Bt strain. These transmem- 
brane channel-forming toxins are stored in Bt as crystal- 
line inclusions called Cry proteins. These proteins are 
the active agents used in the majority of biorational 
pesticides and insect-resistant transgenic crops. Though 
Bt toxins are promising as a crop protection alternative 
and are ecologically friendlier than synthetic organic 
pesticides, resistance to Bt toxins by insects is recog- 
nized as a potential limitation to their application. 

Results: We have determined the 2.2 Å crystal structure
of the Cry2Aa protoxin by multiple isomorphous replacement.
This is the first crystal structure of a Cry toxin
specific to Diptera (mosquitoes and flies) and the first 
structure of a Cry toxin with high activity against larvae 
from two insect orders, Lepidoptera (moths and butter- 
flies) and Diptera. Cry2Aa also provides the first struc- 
ture of the proregion of a Cry toxin that is cleaved to 
generate the membrane-active toxin in the larval gut. 

Conclusions: The crystal structure of Cry2Aa reported 
here, together with chimeric-scanning and domain- 
swapping mutagenesis, defines the putative receptor 
binding epitope on the toxin and so may allow for alter- 
ation of specificity to combat resistance or to minimize 
collateral effects on nontarget species. The putative re- 
ceptor binding epitope of Cry2Aa identified in this study 
differs from that inferred from previous structural studies 
of other Cry toxins. 

Introduction 

The almost 20 million hectares of GM crop fields in North 
America consist of crops engineered for herbicide or 
insect resistance. The genes that confer the latter trait 
come from Bacillus thuringiensis (Bt), a family of Gram- 
positive sporulating soil bacteria that produce para- 
sporal crystals with insecticidal activity. The insecticidal 
activity of particular Bt isolates is directed against nar- 
row spectra of insect larval species, usually within a 

3 Correspondence: stroud@msg.ucsf.edu 

4 Present address: Maxygen, Redwood City, California, 94063. 



single order. Bacterial toxins known as insecticidal crys- 
tal proteins (ICPs) or crystalline (Cry) proteins that are 
sequestered as protoxins in crystalline inclusions after 
sporulation mediate this species-specific pathogenicity 
[1]. The Cry protoxins are ingested, solubilized in the 
larval gut [2, 3], and activated by the removal of an 
amino-terminal segment and a C-terminal segment, the 
size of which depends on the gene or its protoxin [2, 4]. 
The active toxins associate with insect-specific recep- 
tors of gut epithelial cells of the target insect [5] and 
subsequently insert into the cell membrane [6, 7], leading
to the formation of ion channels [8, 9, 10]. This results
in disruption of the electrochemical balance across the
basal membrane, gut paralysis, and larval death [11, 12,
13, 14]. The host cadaver serves as growth medium
for vegetative cells arising from germination of the Bt
endospores.

Species selectivity of Cry proteins is encoded in the 
binding site for the target receptor [5]. Classification of 
the Cry proteins is based on amino acid sequence iden- 
tity [15] and is roughly correlated with the taxonomic 
order of susceptible insect species, spanning species 
of agricultural (Cry1 Lepidoptera, Cry2 Lepidoptera, and 
Cry3 Coleoptera) and public health (Cry2 and Cry4 Dip- 
tera) significance. The structure may help guide muta- 
genesis followed by screening that is directed toward 
the fine tuning of species selectivity in order to design 
insecticides that do not kill nontarget organisms such 
as monarch larvae [16]. It also may assist in the elucidation
of the structural basis of resistance to Bt toxins and
the subsequent generation of novel insecticidal toxins 
for use on Bt-resistant insects [17, 18]. 

Structure-based protein engineering of Cry toxins 
may direct the search for variants with broader suscepti- 
ble species spectra, optimal potency, and stability prop- 
erties. Cry2Aa is among an unusual subset of Cry pro- 
teins possessing broad insect species specificity by 
exhibiting high specific activity against two insect or- 
ders, Lepidoptera and Diptera [19, 20]. It is lethal to 
more lepidopteran species than the Cry1 toxins de- 
ployed against agriculturally important Lepidoptera [21] 
and exhibits a low level of crossresistance in Cry1A-resistant
insects [22]. Also, the mode of action of Cry2Aa
may be distinct from that of other Cry toxins [23]. Thus,
it could serve as a platform for the design of Cry toxins
with broader susceptible species spectra and minimal
Cry1A-derived crossresistance in the field.

Chimeric-scanning mutagenesis experiments have 
identified disjoint blocks (D and L, see Results and Dis- 
cussion) of the Cry2Aa sequence that separately confer 
specificity against dipteran (D) and lepidopteran (L) spe- 
cies [24, 25]. These experiments also demonstrate that 
maximal activity against lepidopteran species requires 
not only L block residues but also some of the specificity 
determinants of the D residue block. Further, Cry2Ab, 
an 87% sequence identical homolog of Cry2Aa, has 

Key words: Bacillus thuringiensis; delta-endotoxin; Cry2Aa; binding 
epitope; crystal structure; X-ray 
Table 1. Data Collection, MIR Phasing, and Structure Refinement Statistics for Cry2Aa

                                            Native   U(NOa)   PtCNS    Ptle     NbCI 2   Ru (1)   Hg (2)

Data Collection (1.08 Å)
Resolution (Å)                              2.2      2.6      2.5      2.6      2.6      2.5      2.5
Unit cell dimensions (Å)                    a = b = 85.6, c = 163.9
Space group                                 P4₃2₁2
Number of observed reflections (oy - 2.5)   245,580  69,703   139,057  70,618   73,949   113,242  126,930
Number of unique reflections                31,591   17,370   20,476   17,999   17,455   19,475   20,198
Completeness (%)                            99.3     89.1     99.0     92.      89.3     94.7     97.0
fU(%)                                       5.7      5.1      5.7      4.7      4.3      5.4      6.3

Phasing/MIR
Resolution                                           2.6      2.6      2.6      4.5      2.6      5.25
Number of sites refined                              5        6        7        6        5        3
Number of reflections (σF > 3)                       16,177   17,808   16,516   3,136    17,074   2,419
R*>(%)                                               16       24       32       10       10       16
R«lE»                                                .62      .62      .59      .60      .67      .62
Rb*Ul                                                .13      .15      .20      .06      .08      .09
Phasing power                                        1.1      1.9      1.8      1.4      0.8      1.2
                                                     .36      .39      .41      .41      .30      .42
<FOM> overall (n phased) (4)                .65 (18,677)

Refinement
Resolution (Å)                              28.0-2.2
Number of reflections (completeness %)      31,509 (93)
R_cryst [σF = 0] (2.3-2.2 Å)                .18 (.21)
R_free [5% test] (2.3-2.2 Å)                .24 (.23)
Number of non hydrogen atoms                5,001
Number of water molecules                   514
Rms bond distances (Å)                      .005
Rms bond angles (°)                         1.2

(1) [Ru(NH3)6]Cl3.
(2) Para chloromercuri phenol (PCMP).
(3) Individual data set results.
(4) Final number of phased reflections.





negligible activity against dipteran species and 3- to
8-fold less activity against certain lepidopteran species
[25, 26]. Hence, Cry2Aa structure and mutagenesis data
provide the basis for future protein engineering of Cry
toxins with modified specificity and selectivity profiles.
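
The reflection counts in Table 1 allow a quick check of how many times, on average, each unique reflection was measured in each data set. The sketch below divides observed by unique reflections; the counts are copied from Table 1 in column order, while the dictionary keys and the function name are hypothetical labels introduced for this example.

```python
# (observed, unique) reflection counts from Table 1, in column order.
# The keys are hypothetical labels, not the derivative names.
REFLECTIONS = {
    "native": (245_580, 31_591),
    "deriv_1": (69_703, 17_370),
    "deriv_2": (139_057, 20_476),
    "deriv_3": (70_618, 17_999),
    "deriv_4": (73_949, 17_455),
    "deriv_5": (113_242, 19_475),
    "deriv_6": (126_930, 20_198),
}


def multiplicity(observed, unique):
    """Average number of measurements per unique reflection."""
    if unique <= 0:
        raise ValueError("unique reflection count must be positive")
    return observed / unique


for label, (observed, unique) in REFLECTIONS.items():
    print(f"{label}: {multiplicity(observed, unique):.2f}")
```

By this measure the native data set is the most redundantly measured, at just under eight observations per unique reflection.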

To understand the structural determinants of Cry toxin
specificity, we determined the crystal structure of the
protoxin of Cry2Aa from Bacillus thuringiensis subsp.
kurstaki. The complete structure was determined by
multiple isomorphous replacement and refined to 2.2 Å
resolution. We have identified a candidate toxin receptor
binding surface that is consistent with available
chimeric-scanning mutagenesis data.

Results and Discussion 

The structure of Cry2Aa from Bacillus thuringiensis
subsp. kurstaki was determined by multiple isomorphous
replacement using six heavy atom derivatives and was refined
to 2.2 Å resolution with R_cryst = 18% (Table 1). The structure
of the 633-amino acid protoxin contains the N-terminal
49-amino acid peptide that is cleaved upon activation and the
three domains of what will become the mature toxin [27]. The
structures of the three domains are surprisingly similar in
overall topology (Figure 1a) to those of the activated toxins
Cry3Aa [28] and Cry1Aa [29], suggesting that removal of the
activation peptide serves to expose regions of the toxin rather
than alter its conformation. This structural homology is
also surprising since these toxins have little sequence
identity to Cry2Aa (20% and 17%, respectively). In the
mature toxin, the N-terminal domain (residues 1-272) is
a pore-forming seven-helical bundle (Figure 1d) [1]. The
second domain (residues 273-473) is a receptor binding
β prism, a three-fold symmetric arrangement of β
sheets, each with a Greek key fold (Figure 1e). The third
domain (residues 474-633) is implicated in determining
both larval receptor binding [30, 31] and pore function
[32] and is a lectin-like C-terminal β sandwich (Figure 1f).

Available chimeric-scanning mutagenesis data [24,
25] define a candidate toxin-receptor binding surface
on Cry2Aa that is comprised of a distribution of hydrophobic
residues (Ile474-Ala477 from β12a, Val365-Leu369 from the
β5-β6 loop, and Leu402-Leu404 from the β7-β8 loop) across the
solvent-exposed surface of the β prism and β sandwich domains
(Figure 1b). Proteolytic activation of the toxin involves the
removal of the 49 N-terminal amino acids and exposes residues
comprising this putative toxin-receptor binding surface.
Removal of the 49 amino terminal residues, comprised of
α0, α0a, and an N-terminal coil, would not affect the
structure of the seven-helical membrane insertion domain,
as seen by comparing the structures of the activated
toxin Cry1Aa and that of the protoxin Cry2Aa.
Figure 1. Topology and Solvent Accessible Surface of Cry2Aa

(a) Ribbon diagram, rendered by Midas Plus [48], of Cry2Aa.
Domain I is shown in magenta, domain II is shown in blue, and
domain III is shown in cyan. The N terminus is shown in red,
while functionally important loops delimiting the putative
toxin-receptor binding epitope are shown in green. A Cry2Aa
insertion, relative to Cry3Aa and Cry1Aa, before β12 at the N
terminus of domain III is shown in magenta. Numbered β strands
referred to in the text are labeled.

(b) The solvent accessible surface, as calculated by GRASP [49],
of domains II and III of Cry2Aa. The orientation is identical to
that shown in Figure 1a. The projection of residue
hydrophobicity onto this surface is shown in color. Portions of
the hydrophobic surface contributed by residues 474, 476, and
477 are shown in cyan, those contributed by residues 365-369 are
shown in blue, those contributed by residues 402 and 404 are
shown in magenta, and the remainder of the surface contributed
by hydrophobic residues is shown in yellow. The remaining
surface that is identified as nonhydrophobic is colored white.
Residue hydrophobicity is as defined by GRASP [49]. The
prominent hydrophobic patch is the center of the putative
toxin-receptor binding epitope. For orientation, the portion of
the surface contributed by residue 357 of the β4-β5 loop is
shown in red.

(c) The solvent accessible surface (as calculated by GRASP) of
domains II and III of Cry2Aa. The orientation is identical to
that shown in Figure 1a. The projection of residue
hydrophobicity onto this surface is shown in yellow, while the N
terminus is shown in red; the N terminus sterically hinders
access to the putative toxin-receptor binding epitope. Portions
of the surface that are identified as nonhydrophobic are colored
white.

(d-f) The three domains of Cry2Aa shown in the same orientation
as in Figure 1a. Labels with amino acid numbers identify the
visible N and C termini of each domain in the figures.




This is also expected since constructs consisting of 
the N-terminal-helical domain of the complete Cry3Ba1 
protoxin (prior to cleavage) are capable of nonreceptor- 
mediated partitioning into lipid bilayers [33], as is the 
activated toxin. 

The structure of Cry2Aa suggests that the N-terminal
residues should sterically hinder access to the putative
binding epitope β5-β6 and β7-β8 loops (Figure 1a,
shown in green) and the exposed parts of domain III
closest to domain II. Projection of hydrophobicity onto
the solvent accessible surface of domains II and III
reveals an 800 Å² hydrophobic patch (Figure 1b) proximal
to these loops. However, while the structure suggests
that the 49 N-terminal residues (α0, α0a, and the N-terminal
coil) should sterically hinder access to the putative
binding epitope, the biological rationale for this function
is unclear. It is unlikely that Bt possesses a receptor
with affinity for the activated toxin. Hence, it does not
seem likely that the N terminus serves to prevent premature
activation of the toxin within Bt. One simple explanation
is that occlusion of the hydrophobic patch of the
putative binding epitope prevents nonspecific aggregation
of the toxin with itself or other host proteins. Another
explanation is that the N-terminal amino acids play a role
in the formation of the environmentally stable crystalline
inclusions.

The specificity-distinguishing residues are also indicated
by comparison of the Cry2Aa structure with the
structure of the highly homologous (87% sequence identity)
Cry2Ab that is inactive against some Cry2Aa target
species (Figure 2a). Chimeric-scanning mutagenesis
[24, 25] defines a continuous 106 amino acid block,
307-412, of specificity-distinguishing residues. (Specifically,
[25] demonstrated that substitution of residues
278-340 resulted in loss of dipteran-specific activity in
Cry2Aa, while [24] demonstrated that substitution of residues
307-382 conferred dipteran-specific activity to
Cry2Ab. Thus, in our discussion, we adopt residue 307
as the N-terminal boundary of the specificity-conferring
sequence in Cry2Aa.) Within these 106 amino acids,
there are 23 residues that differ between Cry2Aa and
Cry2Ab (sequence alignment presented in Figure 5).
Most of the Cry2Aa-Cry2Ab amino acid differences lie
within or about the domain II/III 800 Å² hydrophobic
patch (Figure 1b) and surrounding residues from the β5-β6,
β7-β8, and β4-β5 loops (Figure 1a). The picture of
the putative toxin-receptor binding surface that emerges
is that of an 800 Å² hydrophobic region surrounded by
three loops, those joining β4-β5, β5-β6, and β7-β8,
which are also a part of the putative binding site. The
three loops contain hydrophilic side chains that may be
involved in specific hydrogen bonding with the receptor
and so signal a portion of the site that could be mutated
both to probe these interactions and to alter specificity.
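
The residue counts in this paragraph follow from inclusive numbering: 307-412 is 106 residues, made up of block D (307-340, 34 residues) and block L (341-412, 72 residues). Locating the differing residues within a block amounts to comparing two aligned sequences position by position. The sketch below shows that bookkeeping. The function names are hypothetical and the two short sequences in the usage note are invented placeholders, not Cry2Aa or Cry2Ab sequence; it also assumes a gap-free alignment of equal length.

```python
def block_length(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1


def differing_positions(seq_a, seq_b, first_residue, start, end):
    """Residue numbers within start..end (inclusive) at which two aligned,
    gap-free, equal-length sequences differ. first_residue is the residue
    number of the first character of both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    lo = start - first_residue
    hi = end - first_residue + 1
    if lo < 0 or hi > len(seq_a) or lo >= hi:
        raise ValueError("requested block lies outside the sequences")
    return [first_residue + i for i in range(lo, hi) if seq_a[i] != seq_b[i]]
```

For example, with the placeholder fragments "ACDEFGHIKL" and "ACDQFGHIRL" numbered from residue 301, `differing_positions(..., 301, 301, 310)` reports positions 304 and 309. Applied to the aligned Cry2Aa and Cry2Ab sequences over 307-412, the same comparison would list the 23 differing residues.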

The proximity of this surface to solvent-exposed loops
of the lectin-like domain III is consistent with the finding
that domain III plays a role in fine tuning the susceptibility
of different species. This has been demonstrated by
replacement of domain III [30, 31] to make chimeric
toxins with altered specificity characteristics. The N-terminal
strand β12a of domain III is not present in the
three-dimensional structures of Cry1Aa or Cry3Aa. The
turn between this strand and the rest of domain III is
functionally replaced almost exactly by a loop that connects
β3 and β4 of domain II in the homologous Cry1Aa
and Cry3Aa structures (Figures 1a and 3, shown in magenta).
This functionally conserved β12a motif occupies
the same region of the structure as the β3-β4 turn in
Cry1Aa and Cry3Aa, so it implies conservation of a functional
role in protecting the hydrophobic portion of the
putative receptor binding surface implicated by the homolog
substitutions.

Figure 2. Space-Filling Representation of Cry2Aa
Specificity-Conferring Residues, Detail of Buried D Block
Residues, and Electron Density Covering Buried D Block Residues

(a) Space-filling model of Cry2Aa domains II and III with the N
terminus and membrane-inserting domain I removed. The
orientation reflects a -20° rotation relative to that shown in
Figure 1a. The results of chimeric-scanning [24, 25] mutagenesis
experiments are projected onto the van der Waals surface of
Cry2Aa. The residues colored green and cyan are single amino
acid differences between Cry2Aa and Cry2Ab in block L (residues
341-412). The residues colored yellow and orange are single
amino acid differences between Cry2Aa and Cry2Ab in block D
(residues 307-340). The bar represents an approximate 10 Å
scale.

(b) Packing of D block residues behind the β4-β5 loop. The β4-β5
loop contains L block specificity determinants with which the
buried D block residues interact.

(c) Electron density for the putative receptor binding site
covering residues of the β sheet behind the β4-β5 loop.

Figure 3. Detail of Ribbon Diagram Overlap of Cry2Aa and Cry1Aa
The Cry1Aa domains have been independently fit to those of
Cry2Aa. The functionally important loops delimiting the putative
toxin-receptor binding epitope are shown in green (Cry2Aa) and
blue (Cry1Aa). The Cry2Aa insertion, relative to Cry3Aa and
Cry1Aa, before β12 at the N terminus of domain III is shown in
magenta, while the corresponding loop from Cry1Aa is shown in
cyan (see arrow).
Figure 4. Schematic Representation of Chimeric-Scanning Mutagenesis
Data

The first and last bands represent the Cry2Aa and Cry2Ab sequences,
respectively. The middle bands represent chimeric combinations
in which gray regions correspond to Cry2Ab sequence and
white regions correspond to Cry2Aa sequence. For all bands, except
that corresponding to Hyb513, the three central vertical bars represent
amino acids 278, 340, and 412. For Hyb513, the two central
vertical bars represent amino acids 307 and 382. The activity designations
represent an approximate log scale. For example, the (+ + +)
activity designation for chimera DL112 corresponds to an ID50 of
126 (85.7-187) ng, while the (+) designation for chimera DL115 corresponds
to an ID50 of 3,200 (1,340-51,900) ng; the confidence intervals
correspond to 2σ.

Figure 5. Detail Sequence Alignment of
Cry2Aa and Cry2Ab

Sequence alignment of the D and L block
regions of Cry2Aa and Cry2Ab generated using
ALSCRIPT [51]. In the alignment, identical
amino acids are unmarked; similar residues
(as defined by ALSCRIPT) are colored yellow,
while dissimilar residues are marked green.
The secondary structure associated with the
sequence is presented in the lowermost row.
The block of secondary structure associated
with D block residues is colored magenta,
while that associated with L block residues
is colored cyan.

Chimeric-scanning mutagenesis identifies fairly large
regions of the protein sequence that confer differential
specificity to Diptera and Lepidoptera [25] (Figure 4). In
Figure 4, the first band represents the sequence of
Cry2Aa with its high level of activity (+ + +) against both
Lepidoptera and Diptera. The last band represents the
Cry2Ab sequence that exhibits negligible activity ( )
against Diptera and up to one order of magnitude lower
activity against Lepidoptera when compared with
Cry2Aa. The second band (DL112) represents a Cry2Aa
chimera that contains the Cry2Ab sequence for the
block D residues 307-340 (dipteran-specific). This chimera
has negligible activity against Diptera and is suggested
to have reduced activity (at the 1σ level) against
Lepidoptera, indicating that block D correlates with dipteran
specificity. The activity profile of a reverse chimera
(the third band) [24], in which Cry2Ab contains the block
D sequence from Cry2Aa, shows a more significant reduction
than DL112 against Lepidoptera (of a different
species) but is only reduced 20-fold toward Diptera versus
Cry2Aa. Thus antidipteran activity tracks with the D
block of Cry2Aa.
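
The Figure 4 legend describes the activity designations as an approximate log scale and quotes two ID50 values. A minimal sketch of that comparison follows; it simply takes the ratio of the two quoted values and its base-10 logarithm. The function name `fold_difference` and the constant names are illustrative and do not come from the paper.

```python
import math

# ID50 values (ng) quoted in the Figure 4 legend.
ID50_DL112_NG = 126.0    # chimera DL112, designated (+ + +)
ID50_DL115_NG = 3200.0   # chimera DL115, designated (+)


def fold_difference(id50_weaker: float, id50_stronger: float) -> float:
    """Return how many times more toxin the weaker construct needs."""
    if id50_weaker <= 0 or id50_stronger <= 0:
        raise ValueError("ID50 values must be positive")
    return id50_weaker / id50_stronger


fold = fold_difference(ID50_DL115_NG, ID50_DL112_NG)
log_units = math.log10(fold)

print(f"fold difference in ID50: {fold:.1f}")
print(f"difference in log10 units: {log_units:.2f}")
```

For these two values the ratio is about 25-fold, or roughly 1.4 log units, between designations that sit two steps apart.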

The fourth band (DL115) represents a Cry2Aa chimera
that contains the Cry2Ab sequence for the dipteran-disfavoring
D block and for a neighboring region of sequence,
the lepidopteran-disfavoring L block (residues
341-412). The activity profile of this construct against
both Diptera and Lepidoptera most closely parallels that
of Cry2Ab, which is consistent with blocks D and L
encoding essentially all of the differential specificity determinants.
In summary, the differential specificity for
Diptera in Cry2Aa depends on block D, while that for
Lepidoptera depends on block L. Maximal activity



Table 2. Solvent Accessible Surface Areas, Contacts within 3.4 Å, and Hydrogen Bonds for the Specificity-Conferring
Residues in Cry2Aa

Dipteran Specificity-Conferring

Residue    Exposed Surface (Å²)    Exposed Surface Beyond Cβ (Å²)
Ile307       6                       4
Ser309      26                      26
Ile311       1                       0
Thr314       7                       7
Ile318      91                      89
Gly324      78                       0
Ser334       5                       5
Asn336       6                       6
Ser337       0                       0

Contacts
Ser309, Ser343, Gly481, (Met483), (Tyr342)
Asn341, Ile307, Thr364, (Ser363)
Cys362, (Arg339), (Asn361)
Ser337, Asn357, Asn336, His358, (Asn359)
Thr332, (Thr331)
Leu316, Asn336, (Phe409), (Gln399), (Arg315)
Thr314, Ser334, Ala460, Ala353, (Gly313), (Ile351)
Thr314, Ala353, His358

Lepidopteran Specificity-Conferring

Residue    Exposed Surface (Å²)    Exposed Surface Beyond Cβ (Å²)
Val346      39                      34
Leu350      27                      26
Thr354      50                      26
Asn355     109                      76
Leu356      60                      43
His358      43                      14
Val365     107                      75
Ser370      68                      39
Thr382      54                      24
Ser390       9                       9
Gln399      33                      33
Ser403      93                      73
Cys406      37                      27
Ser410      89                      72

Contacts
Tyr342, (Asn303), (Gly344)
Asn449, Ile450
Glu451
(Pro457)
(Ala353)
Ser312, Thr314, Ser337, (Gly313)
(Asn336)
Pro367, (His21)
(Asn392), (Thr391)
Ser329, Thr331, Asp383
Val374, Arg375, Arg405, (Leu404)
Ser397, Phe398, Cys362

All data were calculated for the activated toxin using HBPLUS [50]. Entries in the left-most column are the 23 specificity-conferring residues.
Entries in the right-most column conform to hydrogen bonding geometry, except for those enclosed in parentheses that are van der Waals
contacts. Bold entries in the right-most column identify specificity-conferring residues also found in the left-most column.



Structure 
414 



Figure 6. Restriction Maps Detailing the Construction
of Plasmid pSB307

(a) Nucleotide sequences of the oligonucleotides
GKP-6 and GKP-7.

GKP-6: 5'-CCC ATG GAT AAT GTA TTG AAT AGT GGA AG-3'
GKP-7: 5'-CAA GCT TTA GGT TAA CTT GAA ATG A-3'

(b) Restriction maps of pSB302, pSB304, and
pSB307.

against Lepidoptera, as seen in Cry2Aa, still requires 
some contribution from block D (sequence alignment 
presented in Figure 5). 

Figure 2a projects the Cry2Aa/Cry2Ab homolog differences
onto the van der Waals surface of Cry2Aa (for
clarity, only domains II and III are shown). In the D block,
there are nine residues that differ between Cry2Aa and
Cry2Ab. Surprisingly, most of these are buried. The notable
exceptions are Ile318 and Gly324 (Asn and Val, respectively,
in Cry2Ab), which are distant from the putative
binding epitope, and the moderately exposed
Ser309 (Asn in Cry2Ab) within the putative binding epitope
(Table 2). Ile307 and Ile311 are found packed behind
exposed residues on the putative binding surface.
Almost half of the variant residues from block D (Thr314,
Ser334, Asn336, and Ser337) are in a cluster that is
packed behind the β4-β5 loop presented from within
the 72-residue L block (Figures 2b and 2c).
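
The buried-versus-exposed reading of the D block can be illustrated with the exposed surface values listed in Table 2. The sketch below sorts the nine D block residues by a cutoff; the 20 Å² threshold and the helper name `split_by_exposure` are assumptions made for illustration, since the paper does not state a numeric criterion.

```python
# Exposed surface areas (Å^2) for the nine D block residues, from Table 2.
D_BLOCK_EXPOSED = {
    "Ile307": 6, "Ser309": 26, "Ile311": 1,
    "Thr314": 7, "Ile318": 91, "Gly324": 78,
    "Ser334": 5, "Asn336": 6, "Ser337": 0,
}

# Hypothetical cutoff: the paper gives no numeric threshold for "buried".
BURIED_CUTOFF = 20.0


def split_by_exposure(areas, cutoff=BURIED_CUTOFF):
    """Split residues into (buried, exposed) lists, preserving input order."""
    buried = [res for res, area in areas.items() if area < cutoff]
    exposed = [res for res, area in areas.items() if area >= cutoff]
    return buried, exposed


buried, exposed = split_by_exposure(D_BLOCK_EXPOSED)
print("buried: ", ", ".join(buried))
print("exposed:", ", ".join(exposed))
```

With this cutoff the exposed set is Ser309, Ile318, and Gly324, matching the three exceptions named in the text; a different cutoff would move the boundary.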

Two of these buried variant residues, Thr314 and
Ser337, make side chain-main chain hydrogen bonds
with the β4-β5 loop. A third residue, Asn336, makes
main chain-main chain hydrogen bonds with the β4-β5
loop, and Thr314 makes side chain-side chain hydrogen
bonds with Ser334. In the less active homolog, Cry2Ab,
these residues are replaced with approximately isosteric
nonhydrogen bonding residues, suggesting that this
pattern of substitutions abolishes affinity for the dipteran
receptor (Thr314Ala, Ser334Ala, Asn336Leu, and
Ser337Ala). It is conceivable that the Ile318Val and Gly324Val
substitutions are part of a region of the protein that
interacts only with the receptor(s) found in dipteran species
and shares some components with the putative
binding epitope that we identify. However, we speculate
that the same exposed surface area binds to the lepidopteran
and dipteran receptors. In this model, these
solvent inaccessible residues behind the putative receptor
binding surface may serve to alter the conformation
of the β4-β5 loop, with its several hydrophilic specificity-determining
residues. Similar modulation of specificity
in protein-protein interactions by noncontact residues
is seen in the context of immunoglobulin residues that
affect conformation of the complementarity-determining
residues (CDR) at the binding surface [34]. Likewise,
affinity maturation of a Fab/antigen complex results in
the optimization of antibody/antigen binding by residues
15 Å from the interaction surface [35].
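
The substitution labels used above (for example Thr314Ala) follow a wild type, position, replacement pattern, and the block boundaries (D: residues 307-340; L: residues 341-412) are given in the text. A small sketch that parses such labels and assigns each position to a block is shown below; the helper names `parse_substitution` and `block_of` are hypothetical.

```python
import re

# Block boundaries as stated in the text (inclusive residue numbers).
BLOCKS = {"D": (307, 340), "L": (341, 412)}

# Cry2Aa-to-Cry2Ab substitutions listed for the buried cluster.
SUBSTITUTIONS = ["Thr314Ala", "Ser334Ala", "Asn336Leu", "Ser337Ala"]

# Three-letter residue code, position, three-letter residue code.
LABEL = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_substitution(label):
    """Return (wild_type, position, replacement) for a label like 'Thr314Ala'."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def block_of(position):
    """Return 'D' or 'L' for a residue number, or None if outside both blocks."""
    for name, (start, end) in BLOCKS.items():
        if start <= position <= end:
            return name
    return None


for label in SUBSTITUTIONS:
    wild_type, position, replacement = parse_substitution(label)
    print(f"{label}: {wild_type} -> {replacement} at {position}, "
          f"block {block_of(position)}")
```

All four listed substitutions fall in block D, while loop residues such as His358 fall in block L.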

The structures of Bt toxins provide a template for 
design and discovery of changes that alter receptor 
targeting in order to either broaden selectivity for better 
field efficacy, prolong the life of existing agents, or avoid
unwanted effects on nontarget organisms. Resistance
to Bt toxins is recognized as a potential limitation in 
their application. Early studies concluded that recessive 
genes controlled the inheritance of Bt resistance. How- 
ever, a recent study suggests that Bt resistance can be 
inherited as an incompletely dominant autosomal gene 
[36]. The authors note that such a mechanism of Bt 
resistance inheritance in the field would significantly 
reduce the usefulness of the high dose/refuge strategy 
of resistance management in which some mates are 
not challenged with toxin. Knowledge of any presumed 
modifications in the receptor that cause resistance can 
potentially instruct rational protein engineering of the 
receptor binding surface to yield toxins that might by- 
pass resistance and still bind to the modified receptor 
of resistant insect species. 

Potential collateral effects upon nontarget insect spe- 
cies [36] and effects upon nontarget predatory insects 
that consume target insect species [37] have been at- 
tributed to Bt GM crops. The structures provide a blue- 
print for focused mutagenesis followed by screening to 
select for each specific target species in a particular 
crop, so as to diminish collateral toxicity to nontarget 
species. By shedding light on the molecular basis of 
toxin-host receptor recognition, the structure provides 
a foundation for engineering Bt-based toxin genes that 
may develop broader insect species specificity, species 
selectivity tuned to reduce collateral impact upon non- 
target species, and longer field efficacy. 

Biological Implications 

We have determined the three-dimensional structure of
the insecticidal toxin Cry2Aa in order to understand the
structural determinants of toxin specificity. Genetically
modified (GM) crops that express insecticidal protein
toxins are an integral part of modern agriculture. Proteins
normally produced by different strains of Bacillus
thuringiensis (Bt) during sporulation mediate a species-specific
pathogenicity of Bt toward insect larvae of the
target species and are the active agents in the majority
of biorational pesticides and insect-resistant transgenic
crops. Though transgenic crops are promising as a crop
protection alternative, problems exist with them. Bt GM crops
may pose a threat to nontarget insect species [16] or to
nontarget predatory insects that consume target insect
species [37]. In addition, resistance to Bt toxins is recognized
as a potential limitation to their application, an
application that is ecologically friendlier than traditional
organic pesticides. For instance, EPA approval of Bt GM maize was
contingent upon the establishment of viable resistance
management strategies [36].

Cry2Aa is among an unusual subset of crystalline (Cry) 
proteins possessing broad insect species specificity by 
exhibiting high specific activity against larvae from two 
insect orders, Lepidoptera and Diptera [24, 25], of ag- 
ricultural and public health significance. Also, the 
Cry2Aa protoxin is significantly smaller (72 kDa) than 
those of the Cry1 proteins (~135 kDa) in the current 
generation of transgenic crops. Since gene size can be a 
limiting factor of protein expression in plants, transgenic 
constructs based upon Cry1 usually express a smaller 
portion of the gene that contains essentially the acti- 
vated toxin. Cry protoxins are presumed to be more 
environmentally stable than the activated toxins; hence, 
transgenic constructs that express the Cry2Aa protoxin 
could deliver higher toxin doses in the field due to 
greater stability [22]. Also, expression of the protoxin 
reduces collateral damage to nontarget insect species 
since it depends on specificity of the host proteases for 
activation [3, 37]. Chloroplast-directed overexpression 
of the Cry2Aa protoxin has been demonstrated and 
shows expression levels equivalent to 2%-3% total sol- 
uble protein in transformed leaves [22]. Such high levels 
of expression, 20- to 30-fold higher than current nuclear 
transgenics, could diminish the opportunity for devel- 
oping resistance by significantly increasing toxin dose 
at the initial encounter with the insect. 

Cry2Ab, an 87% sequence identical homolog of
Cry2Aa, has negligible activity against dipteran species
and 3- to 8-fold less activity against certain lepidopteran
species [25, 26]. Also, there exists a unique body of
chimeric-scanning mutagenesis data in the Cry2Aa/
Cry2Ab system that has identified determinants of species
specificity in the amino acid sequence [24, 25].
Correlating the structure with chimeric-scanning data
indicates that the putative receptor binding epitope of
Cry2Aa lies on the core β sheet and differs from the end
of the β sheet apical loops of domain II, as suggested
from structures of the other Cry toxins [28, 29]. Thus, a
target surface is defined for directed mutagenesis that
may focus engineering of the toxin either to develop
broader insect species specificity, species selectivity
tuned to reduce collateral impact upon nontarget species,
or longer field efficacy. Until now, the search for
new insecticidal bacterial toxins involved collection and
assay of novel isolates of Bt and other bacteria known
to have insecticidal activity. Recent reports describe the
isolation of bacterial species that produce new classes
of insecticidal toxin [38]. These structure data may permit
rational engineering of insecticidal Cry toxins with
desired characteristics.

Experimental Procedures 
Cloning of Cry2Aa 

Oligonucleotide primers flanking the coding region of cry2Aa were
generated based on the published sequence of the gene from Bt
kurstaki strain HD-1 [26]. Primer GKP-6 is a 29-mer that corresponds
to the N-terminal 26 nucleotides of the coding region (Figure 6a).
Primer GKP-7 is a 25-mer that corresponds to a fragment overlapping
the HindIII site that is located ~350 nucleotides downstream
from the stop codon (Figure 6a). Plasmid DNA isolated from Bt
kurstaki HD-1 served as a template for the PCR reaction. The resulting
2100 bp fragment was purified and served as the probe used
to identify the Cry2Aa operon with its accompanying open reading
frames. The hexamer-primed labeling method was used to incorporate
³²P-dCTP into the probe.
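
The stated primer properties can be checked against the sequences printed in Figure 6a. The sketch below assumes the two sequences as read from that figure (with the scan-damaged 5' labels restored and the first oligonucleotide taken to be GKP-6), and uses the standard HindIII recognition sequence AAGCTT; the variable and function names are illustrative only.

```python
# Primer sequences as read from Figure 6a (5' to 3'); the assignment of the
# first sequence to GKP-6 is an assumption based on the figure legend.
GKP6 = "CCCATGGATAATGTATTGAATAGTGGAAG"
GKP7 = "CAAGCTTTAGGTTAACTTGAAATGA"

HINDIII_SITE = "AAGCTT"  # standard HindIII recognition sequence
START_CODON = "ATG"


def coding_overlap_length(primer: str, start_codon: str = START_CODON) -> int:
    """Number of primer bases from the first start codon to the 3' end."""
    start = primer.find(start_codon)
    if start < 0:
        raise ValueError("no start codon found in primer")
    return len(primer) - start


print("GKP-6 length:", len(GKP6))
print("GKP-6 bases from ATG to 3' end:", coding_overlap_length(GKP6))
print("GKP-7 length:", len(GKP7))
print("HindIII site offset in GKP-7:", GKP7.find(HINDIII_SITE))
```

Under these assumptions the lengths agree with the text: a 29-mer whose last 26 bases begin at the ATG, and a 25-mer that contains a HindIII site.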

Previously, it was indicated that the entire gene, including the
coding region and the promoter, is present on a 5.0 kb HindIII fragment
[26] of a plasmid isolated from Bt kurstaki HD-1. The 3.5-7 kb
fragments obtained by HindIII digestion of plasmid DNA isolated
from Bt kurstaki HD-1 were ligated into an E. coli cloning vector,
pTZ18R (Pharmacia, vecbase accession #VB0071) and were used
to transform E. coli DH5α cells by electroporation. Electroporated
DH5α cells were plated onto LB-Amp 50 plates containing X-gal and
IPTG for color selection. The presence of the cry2Aa gene in the
transformed colonies (white) was confirmed by hybridization of the
PCR-generated probe. Restriction analysis was used to confirm
that the clones contained inserts with the cry2Aa gene and also to
establish the orientation with which the fragment was inserted into
pTZ18R. The results of this analysis revealed that one of the clones
corresponded to the orientation designated pSB302 (Figure 6b),
while two clones had the opposite orientation and were designated
pSB303. pSB304, obtained by deleting the 1.2 kb BamHI fragment
(dotted line in Figure 6b), was also transformed into DH5α.

Total protein analysis for proteins produced by E. coli strain DH5α
carrying pSB302, pSB303, or pSB304 was performed by SDS-PAGE.
A protein band of molecular weight 62 kDa, absent in the original
DH5α cells, was observed in all of the clones examined. The level of
expression was the highest in those cells bearing pSB304. Most of
the toxin could be found in the pelletable fraction following sonication
of the cells. Samples were evaluated for biological activity by
bioassay using Manduca sexta as the target insect. All of the clones
(pSB302, pSB303, and pSB304) were active with LD50 values of
~500 ppm.

The pSB304 plasmid retains a unique EcoRI site, ~200 nucleotides
upstream of the cry2Aa promoter, into which the EcoRI-linearized
Bacillus cereus vector pBC16.1 (GenBank accession number
U32369) was cloned (Figure 6b). The resulting clone was used to
transform E. coli DH5α, and clones containing the new plasmid were
designated pSB307. Confirmation of the identity of the new plasmid
and determination of the orientation of the pBC16.1 insert, with
respect to the cry2Aa gene, were made by restriction mapping. One
of the plasmids, pSB307.4, was transformed into Bt cryB (a cry⁻
strain) by electroporation. The plasmid content of these isolates
was verified by restriction mapping.

Cry2Aa expressed well in Bt cryB cells transformed with
pSB307.4, and the protein formed crystalline (rhombohedral) inclusions.
The cells were harvested by centrifugation, washed with water,
and lyophilized. Dried cell mass was added to the insect diet and
fed to M. sexta larvae. The results confirmed that Bt cryB (pSB307.4)
exhibited high insecticidal activity.

Protein Expression and Purification 

The plasmid (pSB307.4) containing the Cry2Aa operon, with its accompanying
open reading frames, was used to transform the cry⁻
strain of Bt (cryB) as previously described [39]. Cry2Aa was purified
from the crystalline inclusions produced in the cells. Inclusions were
harvested by cell lysis and centrifugation. Crystalline inclusions
were washed repeatedly with 0.5 N NaCl to remove proteases and
were transferred to buffer (10 mM Tris-HCl, 1 mM EDTA [pH 8.0])
with 2% mercaptoethanol. Titrating the pH to 10.5, using NH4OH,
solubilized the protein from the crystalline inclusion bodies. The
protein was purified by Sephacryl S300HR column chromatography
as described [40] and concentrated by ultrafiltration to 10 mg ml⁻¹.

Crystallization and Structure Determination 
For recrystallization, hanging drops of the resulting concentrated
protein (10 µl concentrated protein buffered as described above)
were equilibrated against wells that contained Tris buffer (10 mM
Tris-HCl, 1 mM EDTA [pH 8.0]). Crystallization was induced by the
gradual shift to neutral pH as the mobile NH3 diffused from the
drops. Crystals were transferred to storage buffer (50 mM PIPES,
250 mM NaCl [pH 6.8]) with 2% mercaptoethanol. The resulting
crystals are in space group P4₁2₁2; unit cell constants a = 85.6 Å,
c = 163.9 Å. They have one monomer in the asymmetric unit, an






estimated 34% solvent content, and diffract to ~3.0 Å using Cu
X-rays from a rotating anode generator and to 2.0 Å at a synchrotron
source after flash freezing.
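
The relation between the unit cell constants and a solvent content estimate can be sketched with the conventional Matthews calculation. The code below is an illustration under stated assumptions, not the authors' procedure: it assumes eight asymmetric units per cell (as for a primitive tetragonal space group of point group 422), the conventional factor of 1.23 Å³/Da for protein, and a placeholder molecular mass supplied by the caller. Because the mass of the crystallized species and the partial specific volume used by the authors are not given, the output is illustrative and need not reproduce the quoted estimate.

```python
# Unit cell constants reported in the text (tetragonal cell, a = b), in Å.
CELL_A = 85.6
CELL_C = 163.9

# Assumptions, not taken from the paper:
ASU_PER_CELL = 8          # asymmetric units per cell, primitive 422 symmetry
PROTEIN_FACTOR = 1.23     # Å^3/Da, conventional value in Matthews' formula


def cell_volume(a: float, c: float) -> float:
    """Volume of a tetragonal unit cell in Å^3."""
    return a * a * c


def matthews_coefficient(volume: float, mass_da: float,
                         monomers_per_asu: int = 1,
                         asu_per_cell: int = ASU_PER_CELL) -> float:
    """Crystal volume per unit of protein mass (Å^3/Da)."""
    return volume / (asu_per_cell * monomers_per_asu * mass_da)


def solvent_fraction(vm: float, factor: float = PROTEIN_FACTOR) -> float:
    """Estimated solvent fraction from the Matthews coefficient."""
    return 1.0 - factor / vm


volume = cell_volume(CELL_A, CELL_C)
assumed_mass = 72_000.0   # hypothetical input; protoxin mass quoted in the text
vm = matthews_coefficient(volume, assumed_mass)
print(f"cell volume: {volume:.0f} Å^3")
print(f"Matthews coefficient: {vm:.2f} Å^3/Da")
print(f"estimated solvent fraction: {solvent_fraction(vm):.2f}")
```

The estimate is sensitive to the assumed mass: a heavier crystallized species lowers the computed solvent fraction, a lighter one raises it.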

For the collection of data at 100 K, the crystals were transferred
in three steps to a final 20% solution of cryo-protectant (a 1:1 mixture
of 1,2-propanediol and glycerol) and storage buffer and flash frozen
in a cold nitrogen stream. X-ray diffraction data were collected at
SSRL beamline 7.1 using a wavelength of 1.08 Å. Intensity data were
integrated, scaled, and merged using HKL [41]. The overall Wilson
B factor (3.0 Å > d > 2.2 Å) was 14 Å².
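
The resolution limits quoted above map onto scattering angles through Bragg's law, λ = 2d sin θ. A short sketch using the stated wavelength of 1.08 Å follows; the function name `bragg_two_theta_deg` is illustrative.

```python
import math

WAVELENGTH = 1.08  # Å, data collection wavelength stated in the text


def bragg_two_theta_deg(d_spacing: float, wavelength: float = WAVELENGTH) -> float:
    """Scattering angle 2-theta (degrees) for a given d spacing (Å)."""
    sin_theta = wavelength / (2.0 * d_spacing)
    if not 0.0 < sin_theta <= 1.0:
        raise ValueError("d spacing is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


# Limits of the range used for the Wilson B factor.
for d in (3.0, 2.2):
    print(f"d = {d:.1f} Å -> 2-theta = {bragg_two_theta_deg(d):.1f} degrees")
```

At this wavelength the 3.0 Å and 2.2 Å limits correspond to scattering angles of roughly 21 and 28 degrees.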

De novo phasing was achieved using multiple isomorphous replacement
after attempts to find a molecular replacement solution
to the phase problem employing the available coordinates of Cry3Aa
and Cry1Aa were unsuccessful. The heavy atom derivatives (Table
1) were solved from difference Patterson maps as displayed using
XtalView [42]. Difference Fourier inspection for minor sites and refinement
of the heavy atom positions, occupancies, and B factors
were completed in PHASES [43]. The resulting protein electron density
map was subjected to solvent-flipping density modification, as
implemented in Solomon [44]. The helical bundle was apparent in
5 Å maps; at 3 Å resolution, the correct enantiomorph was clear from
its stereochemistry. Using Cry1Aa as the initial building template,
polyalanine versions of the helical and jellyroll domains were manually
positioned using O [45], and the fit was optimized using the real-space
refinement package ESSENS [46]. Positional and simulated
annealing refinement were carried out using the maximum likelihood
target of XPLOR 3.85X [47].